Speech Recognition with Word Fragment Detection Using Prosody Features for Spontaneous Speech
Authors
Abstract
This study proposes a novel approach to word fragment detection using prosody features for spontaneous speech recognition. Incompletely pronounced words produce ill-formed fragments during word building, which sharply degrades the performance of the language model in speech recognition. Recently, the prosody word, rather than the lexical word, has been used as the building block for spontaneous speech processing. Prosody features are extracted from each prosody word and fed into a decision tree that judges whether the prosody word is a complete word or a word fragment. Three categories of features are employed: pitch-related, intensity-related, and duration-related. To evaluate the proposed method, a speech recognition core based on hidden Markov models (HMMs) was developed as the baseline. The proposed method is integrated into the baseline to provide word fragment detection and improve spontaneous speech recognition. In the experiments, the proposed method outperforms traditional speech recognition, especially in insertion and deletion errors. This indicates that word fragment detection can improve spontaneous speech recognition.
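The classification step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the three feature definitions (pitch range, mean intensity, duration), the toy values, and the names `build_tree` and `predict` are all assumptions made for illustration, and the learner is a generic Gini-based decision tree written in plain Python.

```python
from collections import Counter

# Hypothetical toy data, NOT from the paper. Each row is one prosody word:
# (pitch range in Hz, mean intensity in dB, duration in seconds).
TOY_ROWS = [
    (45.0, 68.0, 0.42), (52.0, 70.0, 0.55), (38.0, 66.0, 0.38),
    (12.0, 61.0, 0.12), (9.0, 63.0, 0.15), (15.0, 60.0, 0.10),
]
TOY_LABELS = ["complete", "complete", "complete",
              "fragment", "fragment", "fragment"]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Build a tree: leaves are label strings, inner nodes are tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f, t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


tree = build_tree(TOY_ROWS, TOY_LABELS)
print(predict(tree, (10.0, 62.0, 0.11)))
print(predict(tree, (50.0, 69.0, 0.50)))
```

In a real system the feature vectors would come from pitch, intensity, and duration measurements over each prosody word, and the predicted label would be passed to the recognizer so that fragments can be handled separately from complete words.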
Similar resources
A prosody only decision-tree model for disfluency detection
Speech disfluencies (filled pauses, repetitions, repairs, and false starts) are pervasive in spontaneous speech. The ability to detect and correct disfluencies automatically is important for effective natural language understanding, as well as to improve speech models in general. Previous approaches to disfluency detection have relied heavily on lexical information, which makes them less applic...
Syllable detection in read and spontaneous speech
Automatic syllable detection is an important task when analysing very large speech corpora in order to answer questions concerning prosody, rhythm, speech rate, speech recognition and synthesis. In this paper a new method for automatic detection of syllable nuclei is presented. Two large spoken language corpora (PhonDatII, Verbmobil) were labelled by three phoneticians and then used to adjust t...
A comparative study of HMM-based approaches for the automatic recognition of perceptually relevant aspects of spontaneous German speech melody
Three approaches to the speaker-independent automatic recognition of melodic aspects of spontaneous German are presented. All systems are based on Hidden Markov Models. Their input is restricted to the speech signal, from which a feature extraction component derives eleven prosodic features. No additional information (as commonly used for prosody recognition) like word chains, word hypotheses,...
Towards using Prosody In Speech Recognition/Understanding Systems: Differences Between Read and Spontaneous Speech
A persistent problem for keyword-driven speech recognition systems is that users often embed the to-be-recognized words or phrases in longer utterances. The recognizer needs to locate the relevant sections of the speech signal and ignore extraneous words. Prosody might provide an extra source of information to help locate target words embedded in other speech. In this paper we examine some pros...
Dovetailing of acoustics and prosody in spontaneous speech recognition
Prosody can be applied to improve the performance of spontaneous speech translation systems like VERBMOBIL. In VERBMOBIL we previously augmented the output of a word recognizer with prosodic information. Here we present a new approach of interleaving word recognition and prosodic processing. While we still use the output of a word recognizer to determine phrase boundaries, we do not wait until ...